The Opioid Epidemic
Introduction (Tamer)
The Opioid Crisis is truly that - a crisis. Over the past 20 years,
Data
Prescription Rate Data (Tamer)
Overdose Rate Data (Chris)
Analysis
Prescription Rate vs. Overdose Rate (Chris)
- Show maps
- Ask Questions about whether one causes the other, etc.
Regression
# scatter / regression between prescription rate and overdose rate
mod <- lm(age_adjusted_rate ~ prescription_rate, data = full_data)
msummary(mod)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.56323 2.96818 2.548 0.01409 *
## prescription_rate 0.10475 0.03524 2.972 0.00461 **
##
## Residual standard error: 5.298 on 48 degrees of freedom
## Multiple R-squared: 0.1554, Adjusted R-squared: 0.1379
## F-statistic: 8.835 on 1 and 48 DF, p-value: 0.004609
ggplot(data = full_data, aes(x = prescription_rate)) +
geom_histogram(bins = 15)
ggplot(data = full_data, aes(x = prescription_rate, y = age_adjusted_rate)) +
geom_point() +
geom_abline(intercept = 7.56323, slope = 0.10475)
Clustering (Sean)
Text explaining why k-means
Determining the Optimal K
Explain silhouette score
silhouette_score <- function(k){
km <- kmeans(full_data[, 2:3], centers = k, nstart = 20)
score <- cluster::silhouette(km$cluster, dist(full_data[, 2:3]))
mean(score[, 3])
}
k <- 2:5
avg_sil <- sapply(k, silhouette_score)
optimal_k <- which(as.data.frame(avg_sil)$avg_sil == max(avg_sil)) + 1
optimal_k
## [1] 2
km <- kmeans(full_data[, 2:3], centers = optimal_k, nstart = 20)
full_data <- mutate(full_data, cluster = as.character(km$cluster))